Discrete Geodesics

  • In 03-shortest-curves-float32.ipynb, we have seen that using float32 instead of float64 can break the convergence of the ODE solver.
  • In 05-ode-stability.ipynb, we have seen that omitting the computationally expensive fun_jac parameter is likely to break the convergence of the ODE solver.
  • In 07-moons-rbf.ipynb, we have seen that solving the geodesic ODE on the moons dataset is unstable: the solver often diverges without finding even an approximate solution.

It is worth exploring alternative methods to find a geodesic in the latent space w.r.t. the Riemannian metric. Specifically, we will implement Algorithm 1 of Shao (2017), which uses gradient descent to find a piecewise linear curve at a local minimum of the curve length / energy functional.

Imports and plotting library setup

In [1]:
%load_ext autoreload
%autoreload 2
import numpy as np
import tensorflow as tf
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

# Set up plotly
init_notebook_mode(connected=True)
layout = go.Layout(
    width=700,
    height=500,
    margin=go.Margin(l=40, r=40, b=20, t=20),
    showlegend=False
)
config={'showLink': False}

# Make results completely deterministic
seed = 0
np.random.seed(seed)
tf.set_random_seed(seed)

Visualize the generator's surface

We use the trained generator model from 04-shortest-curves-vae.ipynb. Note that this model only predicts the mean, while the variance was fixed to 0.1.

In [2]:
cone_generator = tf.keras.models.load_model('models/cone-generator.h5')
session = tf.keras.backend.get_session()
WARNING:tensorflow:No training configuration found in save file: the model was *not* compiled. Compile it manually.
In [3]:
from src.plot import plot_3d_surface

points_per_axis = 100
latent_x = np.linspace(-2, 2, points_per_axis)
latent_y = np.linspace(-2, 2, points_per_axis)
surface = plot_3d_surface(latent_x, latent_y, cone_generator, layout)

Plot the magnification factor heatmap

in the latent space, together with the start and end point of the geodesic we are looking for, i.e. the task.
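As a reminder, for a deterministic generator the magnification factor at a latent point is the volume element $\sqrt{\det(J^T J)}$ of its Jacobian. A small self-contained NumPy sketch with hypothetical toy generators and a finite-difference Jacobian (the notebook's own implementation lives in src.plot):

```python
import numpy as np

def magnification_factor(g, z, latent_dim=2, eps=1e-6):
    """sqrt(det(J^T J)) of a generator g at latent point z,
    with the Jacobian J estimated by central finite differences."""
    out_dim = np.asarray(g(z)).shape[0]
    J = np.zeros((out_dim, latent_dim))
    for d in range(latent_dim):
        dz = np.zeros(latent_dim)
        dz[d] = eps
        J[:, d] = (np.asarray(g(z + dz)) - np.asarray(g(z - dz))) / (2 * eps)
    return np.sqrt(np.linalg.det(J.T @ J))

# A flat embedding does not magnify at all ...
plane = lambda z: np.array([z[0], z[1], 0.0])
# ... while scaling both latent axes by 2 magnifies areas by 4
scaled = lambda z: np.array([2 * z[0], 2 * z[1], 0.0])
print(magnification_factor(plane, np.zeros(2)),
      magnification_factor(scaled, np.zeros(2)))
```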

In [4]:
from src.plot import plot_magnification_factor

# Values from 04-shortest-curves-vae.ipynb
z_start = np.array([-1.7773801, -0.99523455])
z_end = np.array([1.872246, 0.8469684])

task_plot = go.Scatter(
    x = np.array([z_start[0], z_end[0]]),
    y = np.array([z_start[1], z_end[1]]),
    mode = 'markers',
    marker = {'color': 'red'}
)
heatmap = plot_magnification_factor(session, latent_x, latent_y, 
                                    cone_generator, 
                                    additional_data=[task_plot], 
                                    layout=layout,
                                    log_scale=False)
Computing Magnification Factors: 100%|██████████| 10000/10000 [00:17<00:00, 569.09it/s]

Add evaluation metrics for a curve

Before implementing anything, we need a performance measure for a given curve. We are interested in its length, and we want curves with constant velocity in the VAE's input space. Therefore, we also measure the ratio between the curve's longest and shortest segment. This equals the ratio between the highest and the lowest velocity, since each segment covers an equal interval in $t$ (the curve is parametrized by $t$ running from $0$ to $1$).

In [5]:
def evaluate(generator, latent_curve):
    # Map the latent curve into the input space and measure each
    # linear segment there
    input_curve = generator.predict(latent_curve)
    lengths = []
    for start, end in zip(input_curve[:-1], input_curve[1:]):
        length = np.linalg.norm(end - start)
        lengths.append(length)

    length = np.sum(lengths)
    velocity_ratio = np.max(lengths) / np.min(lengths)
    print('Length: %f, Max velocity ratio: %f' % (length, velocity_ratio))

Implement the Geodesic Path algorithm

We can use equation 5 from Shao (2017) to compute the gradient at each step. For this, we need to evaluate the Jacobian and the generator output itself at each point of the piecewise linear curve.

In [6]:
from src.util import get_jacobian_fast as get_jacobian

def shao(session, z_start, z_end, generator, num_nodes, save_every=10):
    # Initialize the curve
    nodes = np.linspace(0, 1, num_nodes)
    curve = z_start + np.outer(nodes, z_end - z_start)
    dt = 1 / (num_nodes-1)
    
    # Get a graph op for the jacobian
    generator_input = tf.expand_dims(generator.input[0], 0)
    generator_output = generator(generator_input)[0]
    jacobian = get_jacobian(generator_output, generator_input)
    jacobian = tf.squeeze(jacobian)
    
    learning_rate = 0.01
    epsilon = 0.001 * (num_nodes-2)
    
    # Save the curve regularly
    iterations = [np.copy(curve)]
    step = 0
    gradient_sum = 1
    while gradient_sum > epsilon:
        gradient_sum = 0
        for i in range(1, num_nodes-1):
            # Compute the update for point i
            jacobian_z = session.run(jacobian, feed_dict={
                generator.input: np.array([curve[i]])
            })
            g = generator.predict(curve[[i-1, i, i+1]])
            
            # This is equation 5 in Shao (2017)
            gradient = -1 / dt * jacobian_z.T.dot(g[2] - 2 * g[1] + g[0])
            curve[i] -= learning_rate * gradient
            
            gradient_sum += np.sum(np.square(gradient))
            
        step += 1
        if step % save_every == 0:
            iterations.append(np.copy(curve))
    print('Finished after %d steps' % step)
    return curve, iterations

Find the geodesic path

and visualize the algorithm's iterations.

In [7]:
%%time
save_every = 20
curve, iterations = shao(session, z_start, z_end, cone_generator, 
                         num_nodes=50, save_every=save_every)
evaluate(cone_generator, curve)
print('-' * 20)
Finished after 401 steps
Length: 1.362392, Max velocity ratio: 4.920695
--------------------
CPU times: user 31.1 s, sys: 7.07 s, total: 38.2 s
Wall time: 22 s
In [8]:
from src.plot import plot_latent_curve_iterations

plot_latent_curve_iterations(iterations, [heatmap, task_plot], layout,
                             step_size=save_every)

Implement the algorithm using the metric tensor

For our purposes, we cannot use the algorithm above, because it requires evaluating the generator's output at specific points. Our generator, however, is stochastic, which is why Arvanitidis (2017) defines the Riemannian metric differently from Shao (2017). So we can only work with the metric tensor.

While Shao et al. compute the energy of each curve segment exactly via

$\dfrac{1}{\delta t} \Vert g(z_{i+1}) - g(z_{i}) \Vert^2$,

we need to approximate this by computing

$\delta t \; \dot{\gamma}(t)^T G_{\gamma(t)} \dot{\gamma}(t)$,

analogous to the integral form of the energy functional in equation 2 of Shao (2017). This only depends on the velocity $\dot{\gamma}(t)$ and the metric $G_{\gamma(t)}$, both of which we can compute.
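To see why this approximation is reasonable, consider a linear generator $g(z) = Az$ with constant metric $G = A^T A$: there, the exact segment energy $\frac{1}{\delta t}\Vert g(z_{i+1}) - g(z_i)\Vert^2$ and the metric-based expression $\delta t \, \dot{\gamma}^T G \dot{\gamma}$ coincide. A small NumPy check with made-up values:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2))   # linear "generator" g(z) = A z
G = A.T @ A                   # induced metric, constant for a linear map
z_i, z_next = rng.normal(size=2), rng.normal(size=2)
dt = 1 / 49                   # segment length in t for 50 nodes

# (1/dt) * ||g(z_{i+1}) - g(z_i)||^2
exact = np.sum((A @ z_next - A @ z_i)**2) / dt
# dt * v^T G v with the segment velocity v
v = (z_next - z_i) / dt
approx = dt * v @ G @ v
print(np.isclose(exact, approx))
```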

We can compute both the energy and the length of a curve given the mean generator and the std generator in TensorFlow:

In [9]:
from src.util import get_metric_batch_op

def get_energy_op(curve, mean_generator, std_generator=None):
    dt = 1 / tf.cast(tf.shape(curve)[0] - 1, curve.dtype)

    # Get the starts and ends of the curve's segments
    starts = curve[:-1]
    ends = curve[1:]
    velocities = (ends - starts) / dt
    
    # Evaluating the metric in the middle of each segment gives the best
    # estimates
    middles = starts + 0.5 * (ends - starts)
    metrics = get_metric_batch_op(middles, mean_generator, std_generator)

    # energy = dt * 0.5 * velocity dot metric dot velocity
    energies = tf.matmul(tf.expand_dims(velocities, 1), metrics)
    energies = tf.matmul(energies, tf.expand_dims(velocities, 2))
    energies = 0.5 * dt * energies
    energy = tf.reduce_sum(energies)

    return energy


def get_length_op(curve, mean_generator, std_generator=None):
    dt = 1 / tf.cast(tf.shape(curve)[0] - 1, curve.dtype)

    # Get the starts and ends of the curve's segments
    starts = curve[:-1]
    ends = curve[1:]
    velocities = (ends - starts) / dt
    
    # Evaluating the metric in the middle of each segment gives the best
    # estimates
    middles = starts + 0.5 * (ends - starts)
    metrics = get_metric_batch_op(middles, mean_generator, std_generator)

    # length = dt * sqrt(velocity dot metric dot velocity)
    lengths = tf.matmul(tf.expand_dims(velocities, 1), metrics)
    lengths = tf.matmul(lengths, tf.expand_dims(velocities, 2))
    lengths = dt * tf.sqrt(lengths)
    length = tf.reduce_sum(lengths)

    # Also return the maximum ratio between lengths of segments
    return length, tf.reduce_max(lengths) / tf.reduce_min(lengths)

Now that we have ops for computing the length and energy of a curve, we can create the curve as a variable in TensorFlow and minimize the energy.

In [10]:
def find_geodesic_discrete(session, z_start, z_end, mean_generator,
                           std_generator=None, num_nodes=20, save_every=10,
                           learning_rate=0.001, max_steps=200, log_every=20,
                           reg=0.0):
    dtype = mean_generator.input.dtype
    latent_dim = mean_generator.input.shape.as_list()[1]

    # Initialize the curve
    nodes = np.linspace(0, 1, num_nodes)
    curve_init = z_start + np.outer(nodes, z_end - z_start)

    # Create the curve variables
    curve_start = tf.constant(z_start, dtype=dtype, shape=[1, latent_dim])
    curve_end = tf.constant(z_end, dtype=dtype, shape=[1, latent_dim])
    curve_middle = tf.Variable(curve_init[1:-1], dtype=dtype)
    curve = tf.concat([curve_start, curve_middle, curve_end], axis=0)

    # Log the curve length regularly
    length, vel_ratio = get_length_op(curve, mean_generator, std_generator)

    # Optimize the energy
    energy = get_energy_op(curve, mean_generator, std_generator)
    loss = energy
    if reg > 0:
        loss += reg * vel_ratio
    
    optimizer = tf.train.AdamOptimizer(learning_rate)
    optimize = optimizer.minimize(loss, var_list=[curve_middle])

    # Initialize the graph variables
    session.run(curve_middle.initializer)
    session.run([v.initializer for v in optimizer.variables()])

    # Log the stats for the initial euclidean interpolation
    iterations = [session.run(curve)]
    current_length, current_energy, current_ratio = session.run(
        [length, energy, vel_ratio])
    print('Step 0, Length %f, Energy %f, Max velocity ratio %f' %
          (current_length, current_energy, current_ratio))
    
    # Optimize the curve energy
    for step in range(1, max_steps + 1):
        session.run(optimize)

        if step % save_every == 0:
            iterations.append(session.run(curve))

        if log_every > 0 and step % log_every == 0:
            current_length, current_energy, current_ratio = session.run(
                [length, energy, vel_ratio])

            print('Step %d, Length %f, Energy %f, Max velocity ratio %f' %
                  (step, current_length, current_energy, current_ratio))
    return session.run(curve), iterations

Minimize the curve energy

The algorithm in Shao (2017) updates each point of the curve separately while keeping the other points fixed. In TensorFlow, this would lead to a large graph, since we would have to create a separate tf.train.AdamOptimizer for each curve point.

It turns out that updating all curve points in the same optimizer step works as well and takes much less time to compute than the original algorithm above (600 vs. 18 complete curve updates per second). Therefore, we can run it for a higher number of steps, yielding a slightly shorter curve with a steadier velocity (a velocity ratio of 1.1 vs. 3.8).

We can also see that minimizing the energy of the curve leads to a steady velocity, as stated in Shao (2017).

In [11]:
%%time
curve, iterations = find_geodesic_discrete(session, z_start, z_end, 
                                           cone_generator, num_nodes=100,
                                           learning_rate=0.01,
                                           max_steps=2000,
                                           save_every=100,
                                           log_every=200)
evaluate(cone_generator, curve)
print('-' * 20)
Step 0, Length 2.231040, Energy 5.582327, Max velocity ratio 19.032017
Step 200, Length 1.380684, Energy 0.971520, Max velocity ratio 1.596776
Step 400, Length 1.348779, Energy 0.915848, Max velocity ratio 1.264491
Step 600, Length 1.346573, Energy 0.909600, Max velocity ratio 1.199774
Step 800, Length 1.346529, Energy 0.907822, Max velocity ratio 1.140973
Step 1000, Length 1.346350, Energy 0.906692, Max velocity ratio 1.081519
Step 1200, Length 1.346332, Energy 0.906432, Max velocity ratio 1.046661
Step 1400, Length 1.346528, Energy 0.907053, Max velocity ratio 1.143572
Step 1600, Length 1.346333, Energy 0.906318, Max velocity ratio 1.019385
Step 1800, Length 1.346557, Energy 0.906994, Max velocity ratio 1.117077
Step 2000, Length 1.346341, Energy 0.907182, Max velocity ratio 1.224939
Length: 1.346610, Max velocity ratio: 1.224873
--------------------
CPU times: user 10.3 s, sys: 2.95 s, total: 13.2 s
Wall time: 4.58 s
In [12]:
plot_latent_curve_iterations(iterations, [heatmap, task_plot], layout,
                             step_size=100)

Moons

After getting the discrete algorithm to work for the stochastic Riemannian metric, we can also try it out on the moons dataset, where the ODE solver failed to converge (see 07-moons-rbf.ipynb).

First, we load the generator from the VAE trained on the moons dataset. It also estimates the variance of the reconstructions using the RBF layer described in Arvanitidis (2017).
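The exact architecture lives in src.rbf. As a rough, self-contained illustration of the idea (the precision is modeled as a sum of RBF activations plus a small constant, so the predicted standard deviation grows away from the data), with all centers, weights and bandwidths made up:

```python
import numpy as np

def rbf_std(z, centers, weights, bandwidth=4.0, zeta=1e-2):
    """Sketch of an RBF-based std estimate: the precision is a sum of RBF
    activations plus a small floor zeta, so the std is small near the
    centers (the training data) and large far away from them."""
    z = np.atleast_2d(z)
    sq_dists = ((z[:, None, :] - centers[None, :, :])**2).sum(-1)
    precision = (weights * np.exp(-bandwidth * sq_dists)).sum(-1) + zeta
    return 1.0 / np.sqrt(precision)

centers = np.array([[0.0, 0.0], [1.0, 0.0]])   # hypothetical RBF centers
weights = np.array([10.0, 10.0])
near = rbf_std(np.array([0.0, 0.0]), centers, weights)
far = rbf_std(np.array([5.0, 5.0]), centers, weights)
print(near, far)  # the std is larger far from the data
```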

In [13]:
from tensorflow.python.keras import Model
from tensorflow.python.keras.layers import Lambda
from src.util import wrap_model_in_float64
from src.rbf import RBFLayer

# Load the model
with tf.keras.utils.CustomObjectScope({'RBFLayer': RBFLayer}):
    moons_generator = tf.keras.models.load_model('models/moons-generator.h5')
    
# Get the mean and std predictors
_, mean, var = moons_generator.output
sqrt_layer = Lambda(tf.sqrt)
dec_mean = Model(moons_generator.input, mean)
dec_std = Model(moons_generator.input, sqrt_layer(var))

# Wrap the models in float64, since the std jacobian can become very large 
# with rbf kernels
dec_mean = wrap_model_in_float64(dec_mean)
dec_std = wrap_model_in_float64(dec_std)
WARNING:tensorflow:No training configuration found in save file: the model was *not* compiled. Compile it manually.

Visualize a heatmap of magnification factors

together with the two points between which we want to find the geodesic.

In [14]:
z_start = np.array([0.13, 1.28])
z_end = np.array([0.63, -0.42])
task_plot = go.Scatter(
    x = np.array([z_start[0], z_end[0]]),
    y = np.array([z_start[1], z_end[1]]),
    mode = 'markers',
    marker = {'color': 'red'}
)
In [15]:
heatmap_z1 = np.linspace(-4, 4, 200)
heatmap_z2 = np.linspace(-4, 4, 200)
heatmap = plot_magnification_factor(session, 
                                    heatmap_z1,
                                    heatmap_z2, 
                                    dec_mean, 
                                    dec_std, 
                                    log_scale=True,
                                    scale='hotcold',
                                    additional_data=[task_plot],
                                    layout=layout)
Computing Magnification Factors: 100%|██████████| 40000/40000 [02:11<00:00, 304.88it/s]

Compute the discrete geodesic

In [16]:
%%time
from src.discrete import find_geodesic_discrete

curve, iterations = find_geodesic_discrete(session, z_start, z_end, 
                                           dec_mean, std_generator=dec_std,
                                           num_nodes=100,
                                           learning_rate=0.001,
                                           max_steps=5000,
                                           save_every=500,
                                           log_every=500)
print('-' * 20)
Step 0, Length 47.962304, Energy 3742.484118, Max velocity ratio 361.783694
Step 500, Length 4.170051, Energy 14.716824, Max velocity ratio 47.463373
Step 1000, Length 2.882779, Energy 10.927646, Max velocity ratio 86.813891
Step 1500, Length 2.576390, Energy 9.759809, Max velocity ratio 224.156816
Step 2000, Length 2.609266, Energy 9.691837, Max velocity ratio 275.618253
Step 2500, Length 2.447081, Energy 9.378779, Max velocity ratio 177.935167
Step 3000, Length 2.428816, Energy 9.240455, Max velocity ratio 100.593489
Step 3500, Length 2.485026, Energy 9.331544, Max velocity ratio 282.055812
Step 4000, Length 2.356686, Energy 9.142024, Max velocity ratio 183.369208
Step 4500, Length 2.328911, Energy 9.006500, Max velocity ratio 654.994678
Step 5000, Length 2.352256, Energy 8.984770, Max velocity ratio 801.438125
--------------------
CPU times: user 56.8 s, sys: 12.8 s, total: 1min 9s
Wall time: 21.2 s
In [17]:
plot_latent_curve_iterations(iterations, [heatmap, task_plot], layout,
                             step_size=500)

This is clearly not the geodesic, and minimizing the energy does not lead to a steady velocity this time. The problem here is the initialization of the curve: since the discrete geodesic algorithm performs gradient descent on the curve energy, it gets stuck in a local minimum. To find the true geodesic, it would have to climb over the hill of high variance to reach the valley on the right.

Bezier curves as inits

It is worth noting that this is a synthetic dataset, and it is not clear whether we would experience such problems on, say, MNIST as well. Nevertheless, we can tackle the problem here by using different curve initializations, for example Bezier curves:

In [18]:
import bezier

bezier_coefficients = [-1, 0, 1]
plot_data = []
for coeff in bezier_coefficients:
    direction = (z_end - z_start)
    # Construct the third point of the Bezier polygon by going orthogonally
    # from the middle point
    middle = z_start + 0.5 * direction
    orthogonal = np.array([- direction[1], direction[0]])
    
    # Construct the Bezier curve
    nodes = np.array([
        z_start,
        middle + coeff * orthogonal,
        z_end
    ]).T
    curve = bezier.Curve(nodes, degree=2)
    eval_curve = curve.evaluate_multi(np.linspace(0, 1, 50)).T

    # Plot the Bezier curve
    curve_plot = go.Scatter(
        x=eval_curve[:, 0],
        y=eval_curve[:, 1],
        mode='lines+markers',
        line={'width': 1, 'color': '#00DD00'},
        marker={'size': 5, 'color': '#00DD00'}
    )
    plot_data.append(curve_plot)
data = [heatmap, task_plot] + plot_data
iplot(go.Figure(data=data, layout=layout), config=config)

The rightmost Bezier curve looks like a promising init. It was generated with coeff=1 in the bezier_coefficients above. Let's try it out:

In [19]:
%%time
from src.discrete import find_geodesic_discrete_bezier

curve, iterations = find_geodesic_discrete_bezier(session, z_start, z_end, 
                                                  dec_mean, 
                                                  std_generator=dec_std,
                                                  num_nodes=100,
                                                  learning_rate=0.001,
                                                  max_steps=5000,
                                                  save_every=500,
                                                  log_every=500,
                                                  bezier_coeff=1.0)
print('-' * 20)
Step 0, Length 3.786491, Energy 12.535906, Max velocity ratio 18.976259
Step 500, Length 1.511449, Energy 1.688590, Max velocity ratio 8.598392
Step 1000, Length 1.518164, Energy 1.675852, Max velocity ratio 7.606424
Step 1500, Length 1.520147, Energy 1.671398, Max velocity ratio 7.486335
Step 2000, Length 1.520569, Energy 1.669353, Max velocity ratio 7.314313
Step 2500, Length 1.520612, Energy 1.668291, Max velocity ratio 7.147387
Step 3000, Length 1.520719, Energy 1.667851, Max velocity ratio 7.009013
Step 3500, Length 1.520857, Energy 1.667814, Max velocity ratio 6.910097
Step 4000, Length 1.520633, Energy 1.667369, Max velocity ratio 6.810045
Step 4500, Length 1.520457, Energy 1.667239, Max velocity ratio 6.759334
Step 5000, Length 1.520469, Energy 1.667231, Max velocity ratio 6.715039
--------------------
CPU times: user 58.4 s, sys: 13 s, total: 1min 11s
Wall time: 21.8 s
In [20]:
plot_latent_curve_iterations(iterations, [heatmap, task_plot], layout,
                             step_size=500)

Now the algorithm converges to a small energy and a small length while steadily decreasing the velocity ratio. The solution might not look like a geodesic, especially compared to the initial Bezier curve. However, comparing the length, energy and maximum segment velocity ratio of the initial and the final curve shows that the solution is clearly closer to being a geodesic:

Initial Bezier curve:

  • Length: 3.7
  • Energy: 12.5
  • Max segment velocity ratio: 18.9

Final curve:

  • Length: 1.5
  • Energy: 1.6
  • Max segment velocity ratio: 6.7